A NEW MODEL FOR MANAGING AND DISTRIBUTING CONTENT ON THE WORLD WIDE WEB
The Versant Web Propagation Framework
Abstract
The Versant Web Propagation Framework offers an
entirely new model for managing high volume Web content publishing and
distribution. It has been adopted by Genuity, a Bechtel company, as the
foundation technology for managing Web content across one of the highest
bandwidth, redundant Internet backbones in the world. On top of the
Framework, Genuity has built a product called The Reflector which
manages access to replicated content throughout the network. Together,
they provide a means for building the large scale, distributed, high
performance, high availability Web sites that are essential for the
realization of the industry's vision of electronic commerce.
Introduction
Not since atomic power was going to be
"too cheap to meter" has any technology held such explosive
potential as the World Wide Web. But the visions that Web boosters hold
for global electronic commerce will remain only that, visions, unless
some fundamental problems are addressed. Chief among these are issues of
performance, reliability, availability, integrity, and scalability.
Unless the medium matures to the point that these are taken for granted
- as they are, for example, in today's telephony networks - the long-
awaited and eagerly lusted-after throngs of buyers and sellers will
never materialize.
This paper discusses the problems involved in building such a
robust, scalable infrastructure for Web-based electronic commerce. It
goes on to describe a new model for transmission management, content
distribution and transaction management on the Web. This model is based
on a distributed object database software architecture and a high
bandwidth, redundant transmission backbone that is being built today.
This framework radically changes the way information can be propagated
and accessed across the World Wide Web. The result can be
orders-of-magnitude improvements in performance, reliability,
availability, integrity and scalability for high end Web-based
publishing and transactioning applications. Versant believes this
approach to content distribution and transaction management will
dramatically accelerate the realization of a truly global electronic
marketplace.
Electronic Commerce - The Vision
The vision of a vast
electronic marketplace built on the World Wide Web is well understood:
millions, one day perhaps billions, of users "enter" the
marketplace through browser-enabled computers to buy goods and services
from millions of vendors who have "set up shop" through their
Web sites around the world. The timing, the character and the size of
this marketplace differ according to the imagination, ambition, and
self-interest of the visionary but most informed sources expect it to be
quite large, ranging up to the hundreds of billions of dollars per year
by the year 2000.
No matter how large it is or how soon it arrives, the character of
this marketplace is changing dramatically. The original "Electronic
Bulletin Board" Web with its static page publishing paradigm is
already giving way to a "Shopping Mall" Web characterized by
dynamic publishing with single, point-to-point transactioning. At the
very leading edge, players are building the foundations of a
"Commodities Exchange" Web where millions of participants will
broker, auction and arbitrage for value in everything from wholesale
electric power to financial instruments and airline tickets. Like the
telephony system where a variety of services are resident in the network
and available on demand (voice mail, call forwarding, virtual private
network, etc.), this Commodities Exchange Web will be rich with
transactioning support services, everything from intelligent roving
agents and software rental for specialized transactions to credit
authorization, escrow services, electronic bonding and collection
services. One scenario of this evolution is depicted in Exhibit 1.
Exhibit 1: The evolution of Internet computing - increasing
complexity and value.
Electronic Commerce - The Threat
But the emergence of such a world is by no means assured. Congestion and overload are already
crippling the Web's performance and limiting its attractiveness. These
problems have led such luminaries as Ethernet inventor Bob Metcalfe to
predict an imminent collapse of the Web, crushed under the burden of its
own success and the incremental, linear thinking of its builders. Such a
collapse is presaged by the cumulative impact of five converging forces:
1) The number of users continues to grow
at rates well above 100% per year and will continue to do so for several
years;
2) Usage per user is climbing steadily as the Web
continues to add more and more value and interest to existing users;
3) The density of content per user interaction is
expanding rapidly as multimedia displaces messaging as the primary
content of Web interaction;
4) The complexity of each user interaction is
exploding, especially as the Web evolves from a static, single-site,
page publishing medium to a transactional environment involving multiple
parties, locations, programs and data sources. And finally;
5) The network services to support such usage are
themselves exploding - everything from digital signatures and e-cash to
component subscription and applet metering services - all contending for
limited bandwidth, processor cycles and storage capacity.
These five factors compound one another to place exponential burdens
on the transmission and processing substrates over which all e-commerce
must pass. The old dictum that "quantity has a quality all its
own" was never so meaningful or portentous as it is here. Unless
this problem is addressed, messianic visions for global e-commerce will
soon morph into dystopian nightmares of "Internet Brownout,"
an online hell of data tones forever chirping busy signals, paralyzing
waits, stultifyingly stale content, and vast electronic wastelands of
broken links, dead pages, and abandoned sites. Kind of a Blade Runner
future for burned out cyber-dilettantes.
Inadequacy of Existing Solutions
Existing solutions offer
limited hope, for they are almost all premised on the assumption that
the continued, linear application of conventional resources can address
the problem. But look at the real problem as illustrated in Exhibit 2.
The most successful sites on the Internet today are those that bring
millions of customers to a single location every day. By dint of their
popularity, they are able to sell more advertising thereby adding more
features thereby attracting more users thereby.... in an
upwardly-spiraling, seemingly virtuous circle of popularity, profit and
growth.
Exhibit 2: Popular web sites attract millions of users a day but
suffer from network congestion, processor overload and single point of
failure.
But these and similarly successful sites are already the victims of three
debilitating, inter-related and ultimately destabilizing forces:
congestion getting into and out of the site; delay once inside; and
vulnerability to single point of failure. Congestion getting into the
site is traditionally treated by increasing the size of the pipe or the
network connection leading to the site. Of course, unless this is
simultaneously accompanied by an increase in the number of servers
within the site, it overwhelms the site's processing capacity and
results in Web server performance delays that, from
the user's perspective, are indistinguishable from network congestion delays.
And even if the number of servers is increased, the third problem, risk
from single point of failure, far from being ameliorated, is actually
aggravated.
Ever resourceful under the looming threat of diminished traffic and
reduced profits, Web site owners have adopted the workaround of site
mirroring to address these problems. Under this scheme, the content of
one site is duplicated in one or many other sites. At first blush, site
mirroring appears an ideal solution to the above cited problems of
network bottlenecks, performance congestion and single point of failure.
But site mirroring spawns a gaggle of its own problems: the user
must know that he needs to connect to a mirror site, not just the
primary site; he must know how to connect; and unless the congestion
problem is simply to be rolled over to the larger Internet
"commons", he must know which site is closest, either through
the closest physical network route (the one involving the least number
of hops) or through the network route providing the greatest available
bandwidth - whichever is most appropriate, depending on his application
needs. This requires a presumption of omniscience on the part of the
user that probably outstrips the insight, ingenuity and intrepidness of
all but the most committed of cybernauts.
Undaunted, enterprising Web site managers have devised still more
inventive workarounds. The most notable of these is the Round Robin
Domain Name Service (RR-DNS) under which users are allowed to believe
that they are connecting to a single site when in fact, the RR-DNS is
connecting them to the mirrored sites - kind of a Wizard of Oz for
Webdom - pay no attention to that little man behind the curtain.
But like simple site mirroring, RR-DNS suffers its own unique
problems:
- RR-DNS may direct the user to a site which is already busy;
- It may direct the user to a site which is further away in
network terms;
- The re-directed site may be out of date with respect to other
sites (RR-DNS does nothing to address synchrony of content in multiple
sites - a profound problem for transactionally-intensive applications);
- The site which hosts the RR-DNS process may itself be saturated;
- The RR-DNS site may be unreachable due to a network failure
which effectively takes all the duplicate sites off the air.
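The root of these weaknesses is easy to see in code. The following is a minimal, purely illustrative sketch in Java - the names are hypothetical and it is not how any particular DNS server is implemented - of the essential behavior of round-robin resolution: the resolver simply cycles through a fixed list of mirror addresses, knowing nothing about load, network distance, content freshness, or whether a mirror is even alive.

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin resolution. The resolver hands
// out mirror addresses in strict rotation; nothing here considers server
// load, network distance, content freshness or whether the chosen mirror
// is reachable at all.
public class RoundRobinResolver {
    private final List<String> mirrorAddresses;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinResolver(List<String> mirrorAddresses) {
        this.mirrorAddresses = mirrorAddresses;
    }

    // Every caller gets the next address in the cycle, regardless of which
    // mirror would actually serve the request best - or serve it at all.
    public String resolve() {
        int index = Math.floorMod(next.getAndIncrement(), mirrorAddresses.size());
        return mirrorAddresses.get(index);
    }
}

Every item in the list above follows from that blindness: the rotation continues even when the chosen mirror is saturated, distant, stale or down.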
Rethinking the Requirements
One begins to understand the
wisdom of the Greek myth of the Hydra, which grew two new heads for
each one Heracles could chop off. How to slay such a beast? Let's
reconsider for a moment what it is we're trying to accomplish. As we
said above, for e-commerce to become more than just a high-tech vision,
it has to look a lot more like the phone system in terms of the
following:
- Performance: Can users be assured of short
and predictable waits? Are they consistent from one interaction to
another and common for all players? If not, confidence in the integrity
of time-critical transactions - the sine qua non of electronic commerce
- will never emerge.
- Reliability: The near-religious aphorism that
exemplifies standards for the world's telephony industry is,
"You've got to have a dial tone." Unless users and vendors
have equal confidence in the reliability of datatone provisioning, they
will never commit serious resources to the medium.
- Availability: This is Reliability crossed with
Performance. The site might be up but if you can't get in - or out - it
is meaningless. Conversely, it might be fast but if it's unpredictable
it's too risky to bet your business on. It's got to be there when
you need it and responsive in the way you need it.
- Integrity: Everybody using common data must have
unquestioned assurance that it is the same everywhere, the more so as
the Web expands, as data propagates widely, and as transactions become
an ever-larger component. Any doubts about disparities in the currency
or availability of data will cause the whole of the system to be suspect
and therefore avoided.
- Scalability: Exponential growth means that the
overall infrastructure for Web commerce must scale hundreds - probably
thousands - of times without interruption or degradation of services.
Does anyone seriously believe that the linear approach being followed
today will be able to do this?
It was only after the phone system delivered this level of service
that it became the foundation for a global business system. Current
approaches to building a similarly robust, scalable, high performance
Web backbone fail this test because, fundamentally:
- they lack intelligence about system-wide loading;
- they do not maintain synchrony of data between duplicate sites;
- they do not balance network congestion considerations with
information about processor utilization;
- they presume an intelligence on the part of the user that is
unrealistic;
- they approach an exponential problem with a linear solution;
and,
- they maintain a single-point-of-failure architecture.
If the industry's hopes for a truly large scale, distributed,
content-rich, transactionally-intensive electronic marketplace are to
be realized, we must re-think the way the underlying transmission,
content publishing and transactioning infrastructures are built.
The Versant Web Propagation Framework
The Versant Web
Propagation Framework offers a new model for managing high volume Web
content publishing and distribution systems. It is built on the Versant
Object Database Management System, the most widely deployed ODBMS in the
world. (See the section below, "Why an Object Database?") The
Versant Web Propagation Framework has been adopted by Genuity, a Bechtel
company, as the foundation technology for managing Web content
distribution across one of the highest bandwidth, redundant Internet
backbones in the world today. On top of the Framework, Genuity has built
a product called The Reflector which manages access to replicated
content throughout the network. Together, they provide a means for
building the large scale, distributed, high performance, high integrity,
robust Web sites that are essential for the realization of the vision of
electronic commerce.
At the highest level, the Web Propagation Framework enables Web
sites to replicate any amount of Web content among multiple locations
throughout the world. The content from each site is made available to
users on a synchronous basis - all users see new or changed content at
exactly the same time, no matter from which site it originated or which
site they access. The Genuity Reflector directs user queries to the Web
site that is closest to the user in network terms or which has the
greatest available capacity at a given time. The result is a massively
distributed, single system image of Web-resident data that delivers
reliability, scalability and performance never before possible. A
conceptual representation of the system is presented in Exhibit 3.
Exhibit 3: Sites propagate content to one another and all display
it to users at the same time. The Reflectors send user requests to the
site which is closest in network terms or which has the most available
capacity.
The Framework utilizes the underlying distributed object database technology
of the Versant ODBMS. Content is replicated according to policies defined
by Web site administrators; the unit of replication can range from single HTML
pages or Java applets up to the data for individual transactions, entire
volumes or even entire Web sites. The timing and the events
"triggering" the replication process are also policy matters
and can be initiated automatically or driven manually, again under the
complete control of Web site administrators.
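As a rough illustration of what such administrator-defined policy might look like - the class, enum and field names below are hypothetical and are not the Framework's actual API - a replication policy simply pairs a unit of content with a triggering rule:

// Hypothetical sketch of an administrator-defined replication policy.
// It only illustrates the idea that both the unit of replication and the
// triggering event are policy choices under the administrator's control.
public class ReplicationPolicy {
    public enum Unit { HTML_PAGE, JAVA_APPLET, TRANSACTION_DATA, VOLUME, ENTIRE_SITE }
    public enum Trigger { ON_EVERY_CHANGE, ON_SCHEDULE, MANUAL }

    private final Unit unit;        // what gets propagated
    private final Trigger trigger;  // when propagation is initiated
    private final String schedule;  // e.g. "nightly"; used only with ON_SCHEDULE

    public ReplicationPolicy(Unit unit, Trigger trigger, String schedule) {
        this.unit = unit;
        this.trigger = trigger;
        this.schedule = schedule;
    }

    public Unit unit() { return unit; }
    public Trigger trigger() { return trigger; }
    public String schedule() { return schedule; }
}

Under such a scheme, a frequently changing price page might be propagated on every change at page granularity, while a rarely changing catalog volume might be propagated on a nightly schedule.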
The propagation process works like this: each site contains an
arbitrary number of Web Views - virtual partitions of files and
directories defined in the database - organized according to the change profile of
the Web site content (for example, some content changes often, some rarely)
and the business rules of the site's owner. Working changes
to any content in the site are made through the Web Views and are
recorded to a "change object" in the database before they are
displayed on the local site. This change object maintains information
about operational directories and their contents and is common to all of
the participants or "peers" in the system. Peer sites can be
added at any time with complete transparency to the operation of
existing sites.
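Conceptually - and only conceptually, since the class below is illustrative rather than the Framework's actual schema - a change object is a persistent record of which operational directories and files a working change touched, created before the change becomes visible on the local site and shared with every peer:

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Illustrative sketch of a persistent "change object". It records which
// operational directories and files a working change touched so that every
// peer site can apply the identical change. Not the Framework's real schema.
public class ChangeObject {
    private final String originSite;              // peer that made the change
    private final Date createdAt = new Date();
    private final List<String> touchedPaths = new ArrayList<>();

    public ChangeObject(String originSite) {
        this.originSite = originSite;
    }

    // Called for each directory or file modified through a Web View,
    // before the change is displayed on the local site.
    public void recordChange(String directoryOrFilePath) {
        touchedPaths.add(directoryOrFilePath);
    }

    public String originSite() { return originSite; }
    public Date createdAt() { return createdAt; }
    public List<String> touchedPaths() { return touchedPaths; }
}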
When changes are initiated, their propagation is negotiated via a
lightweight distributed two-phase commit protocol which commits first
the change list to the participating sites, then the content itself. The messages
between sites regarding changes are persistent, ensuring that all sites
maintain common state and that content integrity is assured throughout. The
Framework utilizes an n-way protocol whereby any site can act as a
"master" in the propagation process and any site can act as a
"slave." It provides an automatic, services-level mechanism
for detecting and resolving conflicts regarding propagation timing among
multiple sites. Knowledge and state of each of the participating sites
are maintained in each site's database.
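A highly simplified sketch of the two-phase idea follows, reusing the illustrative ChangeObject above; the interfaces are hypothetical and the real protocol is internal to the Framework. The point it illustrates is the ordering: the change list is committed to every peer first, and only then is the content itself shipped, so no site displays content whose change list has not been agreed.

import java.util.List;

// Hypothetical peer interface, for illustration only. In the Framework the
// messages between sites are persistent, so a peer that is temporarily
// unreachable can catch up once it is back on line.
interface PeerSite {
    boolean prepareChangeList(ChangeObject change); // phase 1: vote
    void commitChangeList(ChangeObject change);     // phase 1: commit the list
    void receiveContent(ChangeObject change);       // phase 2: ship the content
}

public class PropagationCoordinator {
    // Any site can play the coordinating ("master") role for a given change;
    // the others act as "slaves" for that change.
    public boolean propagate(ChangeObject change, List<PeerSite> peers) {
        // Phase 1: the change list is committed everywhere, or nowhere.
        for (PeerSite peer : peers) {
            if (!peer.prepareChangeList(change)) {
                return false; // a peer declined; no site displays the change
            }
        }
        for (PeerSite peer : peers) {
            peer.commitChangeList(change);
        }
        // Phase 2: only after the change list is agreed is the content shipped.
        for (PeerSite peer : peers) {
            peer.receiveContent(change);
        }
        return true;
    }
}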
On top of the Web Propagation Framework,
Genuity has built The Reflector, which itself can be replicated
throughout the system and which is updated as to the content,
congestion, and availability of all of the sites in the network.
Requests for access to specific Web content can be forwarded to the site
which is closest to the user in network terms or which has the highest
available capacity. In the event of any sites failing, the Reflector
automatically routes subsequent requests to sites which are alive,
returning to the down sites once they are brought back on line. Finally,
the entire mechanism and process is completely transparent to the user.
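To make the selection logic concrete - purely as a sketch, with hypothetical field names and scoring that are not Genuity's implementation - a Reflector-style router skips sites it knows to be down and, among the live ones, prefers the site with the fewest network hops to the user, breaking ties by spare capacity:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of Reflector-style request routing: dead sites are
// skipped automatically, and among live sites the request goes to the one
// closest in network terms, with spare capacity as the tie-breaker.
public class RequestRouter {
    public static class SiteStatus {
        final String address;
        final boolean alive;
        final int hopsFromUser;      // network distance to the requesting user
        final double spareCapacity;  // fraction of processing capacity free

        public SiteStatus(String address, boolean alive,
                          int hopsFromUser, double spareCapacity) {
            this.address = address;
            this.alive = alive;
            this.hopsFromUser = hopsFromUser;
            this.spareCapacity = spareCapacity;
        }
    }

    // Returns the address of the chosen site, or empty if every site is down.
    public Optional<String> chooseSite(List<SiteStatus> sites) {
        return sites.stream()
                .filter(s -> s.alive)
                .min(Comparator.comparingInt((SiteStatus s) -> s.hopsFromUser)
                        .thenComparingDouble(s -> -s.spareCapacity))
                .map(s -> s.address);
    }
}

When a down site is brought back on line its status simply flips back to alive and it re-enters the candidate set, which is the behavior described above.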
While the "manifestation" of content on peer sites can be
timed according to the dictates of the application being accessed (a
policy option of the Framework itself), the real value emerges when
system-wide synchrony is practiced. See, for example, Exhibit 4, which
displays different classes of applications and the implications for
integrity and transactional volume.
Exhibit 4: Different classes of applications require different
levels of integrity and synchrony, partly depending on transaction
volume. Policy decisions on site replication must reflect these
constraints.
As users enter sites under the
management of the Web Propagation Framework, their hyperlinking
activities may take them into and out of as many as a dozen or more
different Web sites. Say the user accesses a site for Directory
services, then links to a corporate Web site to gather information. From
there he goes to a site for time-delayed stock quotes and finally moves
to a near-realtime stock brokerage site.
The Reflector knows the user's network location as well as the
location and status of each of the sites under management of the Web
Propagation Framework. In managing each of these hyperlinked connections
it is therefore able to direct each request to the Web site that is closest to
the user in network terms (the one involving the fewest hops) or which
has the greatest available capacity. In any event, no matter which of
many available sites the user is directed to, he sees exactly the same
content as is displayed on every other site.
Exhibit 5: The Web Propagation Framework makes the same
information available to multiple sites at the same time, greatly
improving system reliability. Users access data locally, seeing dramatic
improvements in perceived Web site performance.
The consequences of this
architecture are quite dramatic relative to existing alternatives, and
are represented in Exhibit 5. Addressing the Requirements noted above,
the architecture delivers the following:
- Performance. Performance from the user's
point of view is improved dramatically. This is due to three factors: 1)
intelligent routing of user requests to the most appropriate sites; 2)
reduced network contention at the entry point to popular locations; and
3) distribution of processing burdens across multiple locations and
machines to those with the greatest available capacity.
- Availability. Availability is substantially
improved over single site alternatives, site mirroring and RR-DNS
options as multiple sites with redundant access points and intelligent
routing provide unlimited failsafe capabilities.
- Improved vendor economics. Significant cost
savings are realized by the application of many smaller machines in
distributed locations versus the "Battlestar Galactica"
phenomenon that overtakes popular sites and leads to quickly diminishing
performance per unit of money applied.
- Unlimited scalability. There are no practical
limits to the number of sites that can be managed through this approach.
For vendors earning advertising revenues based on user hits, this
architecture provides an almost infinitely elastic path to expansion
without interrupting existing operations.
- Complete transparency. Users never know that their
requests are being directed to different sites at different times based
on network and capacity considerations. All they see are dramatic
improvements in speed and exactly the same information no matter which
site they are connected to.
Taken together, these can provide the most robust, highest
performance, scalable Web site hosting, content distribution and
distributed Web-based transaction management system available in the
world today.
Why an Object Database?
But why an object database?
Couldn't this be built with a relational database? The answer is no. The Web simply
doesn't exist without objects. It is saturated with them. The C++
programs, browser plug-ins, CORBA and DCOM/OLE messaging, Java applets,
ActiveX components, and the content itself - the graphics, the audio,
video, schematics - are all objects. The rules of engagement for
transactions in cyberspace are being developed in object-oriented
languages and exist in programmatic form as objects: What are the terms
of the arbitrage? How is the escrow to be settled? What are the
demographics of the current user? How can I build a portfolio management
application from rented components and use it to run a Capital Asset
Pricing Model analysis on my 401K accounts that are spread across eleven
funds in six different management firms, then use the results to
re-allocate my portfolio? The dynamism, distribution, flexibility and
heterogeneity that are the very hallmark of the Web are only possible through
the use of object technology.
But does this necessarily imply the need for object storage and
management? Within three to five years there will be literally trillions of
such objects floating around in cyberspace. All will need storage,
managing, versioning, transacting, and metering - let's not forget we're
going to have to make money somewhere! But trying to do this with
relational technology that was invented in the early 1970s is like
trying to cut up tennis balls to make them fit into envelopes - the
two-dimensional tables that are the paradigm for relational storage. It can
be done, but it's hard, messy, time consuming and error prone. Relational
vendors call it "mapping". It not only exacts a huge overhead
for every line of code written, it produces programs that are
horrifically intractable - not what you need when Web life cycles are
measured in dog years and your apps need to be updated every day.
Worse, what do you do when you want your tennis balls back? The
technique is called "joining". Bring all the possible data to
a central location, look up indexes in each table that point to foreign
keys in other tables, join the tables together, cull through all the
data to see what you need and what you don't need from the common set,
and...are you still there? Your user isn't. He left hours ago in
disgust. For complex transactions involving large sets of data, ODB
users have reported publicly that they're getting 1000X improvements in
performance relative to RDBs. That's 1,000 times faster, not 1,000 percent. With
the way the Web is growing in terms of data volumes, users, complexity
of transactions and content, you simply cannot cut up, stuff into
envelopes, ship around and re-glue tennis balls fast enough to scale and
still meet performance objectives.
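The contrast is easiest to see in code. The classes below are a deliberately simplified illustration, not a Versant API: with object storage the relationship between a customer and her orders is stored directly as references, so retrieving the orders is reference traversal rather than a join reassembled from foreign keys at query time (for example, SELECT ... FROM orders o JOIN customers c ON o.customer_id = c.id, with hypothetical column names).

import java.util.ArrayList;
import java.util.List;

// Illustrative only. The customer-to-order relationship is stored as object
// references, so "give me this customer's orders" is a direct traversal -
// no mapping layer, no foreign-key lookup, no join.
public class Customer {
    private final String name;
    private final List<Order> orders = new ArrayList<>(); // stored references

    public Customer(String name) { this.name = name; }

    public void addOrder(Order order) { orders.add(order); }

    public List<Order> orders() { return orders; } // traversal, not a join
    public String name() { return name; }
}

class Order {
    private final String item;
    private final double amount;

    public Order(String item, double amount) {
        this.item = item;
        this.amount = amount;
    }

    public String item() { return item; }
    public double amount() { return amount; }
}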
Why Versant?
So why Versant? Versant was not only the
first ODB vendor to go public, it is also the most widely deployed ODBMS
in the world. Versant is the engine driving airline reservation systems,
regional electric power grid management systems, global logistics
management systems, advanced corporate billing systems, commodities
trading systems, and other impossibly hard, mission-critical systems
that simply couldn't get built with any other available technology.
Versant has been cited as a benchmark standard for the world's telephony
industry for building next generation network management, service
activation and service deployment applications. These are the hardest
applications on earth.
Not surprisingly, it is the same architecture that made Versant
attractive to the world's telephony industry that makes it such an ideal
fit for Web publishing, transaction and transmission management. The
requirements for both environments in terms of distribution,
scalability, performance, reliability and flexibility are almost
identical, although the standards in telephony are still much higher
than they are for the Web - remember, you've got to have a dial tone.
Versant was designed in the late 1980s by
some of the most prestigious RDB architects in the world, people who
knew everything about high volume transactioning databases but also knew
the world was no longer flat, simple, centralized or static. Its object
granularity enables it to scale across 65,000 distributed databases,
managing 281 trillion objects in each database. These are the
credentials and this is the technology that allow us to demonstrate a
new model for distributing and managing Web content in a manner that
greatly improves the chances that the visions the industry holds for the
Web can now be realized.
Conclusion
The promise for the Web as a medium for
conducting global electronic commerce is simply breathtaking. But if
those ambitions are to be realized, existing models for how to manage
Web content, transactions, and transmission will have to change. Current
solutions will simply not scale well enough and are already provoking
predictions of collapse.
Versant has developed the Versant Web Propagation Framework as a
foundation for building massively distributed, high performance, highly
scalable Web site systems. It has been embraced by Genuity, a Bechtel
company, which is using it to manage synchronous distribution of Web
content across one of the highest bandwidth Internet backbones in the
world. Versant welcomes the opportunity to work with industry
participants to apply its technology to other similarly ambitious
projects in pursuit of a truly global electronic marketplace.